Abstract

Most image labeling problems such as segmenta- 
tion and image reconstruction are fundamentally 
ill-posed and suffer from ambiguities and noise. 
Higher order image priors encode high level struc- 
tural dependencies between pixels and are key to 
overcoming these problems. However, these pri- 
ors in general lead to computationally intractable 
models. This paper addresses the problem of dis- 
covering compact representations of higher order 
priors which allow efficient inference. We propose 
a framework for solving this problem which uses 
a recently proposed representation of higher or- 
der functions where they are encoded as lower en- 
velopes of linear functions. Maximum a Poste- 
rior inference on our learned models reduces to 
minimizing a pairwise function of discrete vari- 
ables, which can be done approximately using 
standard methods. Although this is a primarily 
theoretical paper, we also demonstrate the prac- 
tical effectiveness of our framework on the prob- 
lem of learning a shape prior for image segmenta- 
tion and reconstruction. We show that our frame- 
work can learn a compact representation that ap- 
proximates a prior that encourages low curvature 
shapes. We evaluate the approximation accuracy, 
discuss properties of the trained model, and show 
various results for shape inpainting and image seg- 
mentation. 

1 Introduction 

Most computer vision problems can be formulated 
in terms of estimating the values of hidden vari- 
ables from a given set of observations. In such a 
setting, probabilistic models are applied to repre- 
sent the prior knowledge about hidden variables 
and their statistical relationship with observed 
variables. 

A number of models encoding prior knowledge 
about scenes have been proposed in computer vi- 
sion. The most popular ones have been in the 
form of a pairwise Markov Random Field (MRF). 
A random field is a strictly positive probability
distribution of a collection of random variables.
A Markov Random Field (MRF) additionally satisfies
some (or no) Markov (conditional independence)
properties [ ]. An important characteristic




(a) Incomplete Image (b) Pairwise Result 




(c) Our Result (d) Final Inpainting 

Figure 1: (a) Input image (the area for completion of
the starfish is shown in blue). (b) The starfish was
interactively segmented from the image. Then the
three arms of the starfish, which touch the image
borders, were completed with an 8-connected
pairwise MRF which encodes a standard length
prior. Note that with this prior no pixels in the
blue completion area were assigned to the starfish
arms. (c) Completion of the shape of the three
starfish arms was done with our compact
higher-order prior which models curvature. (d) Finally,
texture was added fully automatically using [ ].

of an MRF is the factorization of the distribution 
into a product of factors. Pairwise MRFs can be 
written as a product of factors defined over two 
variables at a time. For discrete variables, this en- 
ables non-parametric representation of factors and 
the use of efficient optimization algorithms for ap- 
proximate inference of the Maximum-a-Posteriori 
(MAP) solution. However, because of their restricted
pairwise form, these models are not able to
encode many types of powerful structural properties
of images. Curvature is one such property
which is known to be extremely helpful for inpaint- 
ing (see figure 1), segmentation, and many other 
related problems. 

Figure 2: (a) A given cost function for curvature, which we want to approximate. (b) Our method
learns a large set of 'soft' pattern-based potentials which implicitly model the curvature cost. Our learned
MRF model has a higher-order 8x8 pattern-based potential at each pixel location. Our model implicitly
selects at each position the best-fitting pattern for a labeling. Intuitively, a pattern fits well (has low
cost) if all foreground pixels match blue pattern weights and all background pixels match red pattern
weights. Pixels which match green pattern weights do not contribute and may be either background
or foreground. The last two patterns encode the fact that if the higher-order potential is defined on top
of a non-boundary location (all 4 center pixels are foreground or background), then the curvature cost
is 0, i.e. is ignored. (c) An example demonstrating the curvature cost computed by our pattern-based
approximation at different parts of the object boundary. Circle radii correspond to the assigned cost.

Higher-order Priors There has been a lot of
research into priors based on high-level structural
dependencies between pixels such as curvature.
These priors can be represented in the probabilistic
model using factors which may depend on more
than two variables at a time. The largest num- 
ber of variables in a factor is called the order of 
the probabilistic model. Higher-order factors de- 
fined on discrete variables are computationally ex- 
pensive to represent. In fact, the memory and 
time complexity for inferring the MAP solution 
with general inference algorithms grows exponen- 
tially with the order, and thus has limited the use 
of such models. The situation is a bit different 
for parametric models with continuous variables. 
Higher-order prior models such as Product of Ex- 
perts (PoE) [ ] or Field of Experts (FoE) [ ] are 
differentiable in both parameters and hidden vari- 
ables. These models thus enable inference using 
local gradient descent, and have led to impressive 
results for problems such as image restoration and 
optical flow. 

Recent research on discrete higher-order models 
has focused on identifying families of higher-order 
factors which allow efficient inference. The factors 
fall into three broad categories: (a) Reducible
factors, which allow MAP inference to be
reduced to the problem of minimizing a pairwise
energy function of discrete variables with the
addition of some auxiliary variables [5, 6, 7, 8, 9],
(b) Message-enabled factors, which allow efficient
message computation and thus allow inference using
message passing methods such as Belief Propagation
(BP) and Tree Reweighted message passing
(TRW) [ ], and (c) Constraint factors,
which impose global constraints that can be
imposed efficiently in a relaxation framework [13, 14].

Pattern-based Representation Pattern and
lower-envelope based representations proposed
in [ , , ] can represent some families of Reducible
factors. The higher-order potentials of [11, 9]
are defined by enumerating important configura- 
tions (patterns) in a local window. The model 
of [ ] additionally enables deviations from encoded 
patterns, by using linear weighting functions. The 
above models are generalized by the representa- 
tion proposed in [ ] which encodes higher-order 
functions as lower (or upper) envelopes of linear 
(modular) functions of the variables. The com- 
plexity of representing and performing inference 
depends on the number of linear functions (or pat- 
terns) used for representing the higher-order fac- 
tor. A number of higher-order priors can be en- 
coded using few linear functions (or patterns) and 
thus allow efficient inference. However, the use 
of a general higher-order prior would require an
exponential (in the order of the factor) number of
linear functions (or patterns).

Our Contribution This paper addresses the 
problem of discovering a compact representation 
of a general higher-order factor. More specifically, 
given a particular higher-order factor, we try to 
find the linear-envelope representation which best 
approximates it. Given a set of training examples 
of labelings and their corresponding desired costs,
we find parameters of a linear-envelope represen- 
tation that matches these costs. While the prob- 
lem is difficult, we propose a simple yet effective 
algorithm. 

We demonstrate the efficacy of our method
on the problem of finding a compact 'curvature
prior' for object boundaries. This prior encourages
smooth boundaries by assigning a high cost to high
curvature shapes and a low cost to low curvature
shapes. Given a set of training shapes, we find
parameters of a linear-envelope representation that
closely match their pre-specified curvature based
cost^. Figure 2 illustrates our discovered model.
We then use the discovered higher-order factors in 
the problems of object segmentation and comple- 
tion. The experimental results demonstrate that 
incorporation of these priors leads to much better 
results than those obtained using low-order (pair- 
wise MRF) based models (see figure 1) and other 
state-of-the-art curvature formulations. 

An outline of the paper follows. In section
2, we provide the notation, define the higher-order
model, and explain the lower-envelope representation
of higher-order factors. Section 3 reviews
research on using curvature priors for labeling
problems. Section 4 explains how we learn a
lower-envelope representation of a curvature based
higher-order prior model. Section 5 discusses the
techniques we used to perform MAP inference in
the pairwise model corresponding to the discovered
higher-order model. Section 6 describes our
experimental setup and provides the results. We
conclude by summarizing our framework and listing
some directions for future work in Section 7.

^The purpose of this exercise is only to demonstrate the
power of our representation scheme. A more difficult problem
would be to learn the linear envelope model in an unsupervised
way from examples of segmentations. We do not
address this problem in this paper.

2 Higher-order Model Representation 

We consider a set of pixels $\mathcal{V} = \{1 \dots N_x\} \times \{1 \dots N_y\}$
and a binary set of labels $\mathcal{L} = \{0, 1\}$,
where 1 means that a pixel belongs to the foreground
(shape) and 0 to the background. Let
$\mathbf{x} \colon \mathcal{V} \to \mathcal{L}$ be the labeling for all pixels with
individual components denoted by $x_v$, $v \in \mathcal{V}$. Furthermore,
let $V(h) \subset \mathcal{V}$ denote a square window
of size $K \times K$ at location $h$, and let $\mathcal{U}$ be the set of all
window locations. Windows are located densely
in all pixels. More precisely, all possible $K \times K$
windows are considered which are fully inside the
2D-grid $\mathcal{V}$ (fig. 3 illustrates boundary locations).
Let $\mathbf{x}_{V(h)} \colon V(h) \to \mathcal{L}$ denote the restriction of
labeling $\mathbf{x}$ to the subset $V(h)$.

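We consider distributions of the form $p(\mathbf{x}) \propto \exp(-E(\mathbf{x}))$
with the following energy function:

$$E(\mathbf{x}) = \sum_{v \in \mathcal{V}} \theta_v(x_v) + \sum_{uv \in \mathcal{E}} \theta_{uv}(x_u, x_v) + \sum_{h \in \mathcal{U}} E_h(\mathbf{x}), \tag{1}$$

where notation $uv$ stands for the ordered pair $(u, v)$,
$\theta_v \colon \mathcal{L} \to \mathbb{R}$ and $\theta_{uv} \colon \mathcal{L}^2 \to \mathbb{R}$ are unary and pairwise
terms, $\mathcal{E} \subset \mathcal{V} \times \mathcal{V}$ is a set of pairwise terms and
$E_h$ are higher-order terms. We consider higher-order
terms of the following form (equivalent
to [9]):

$$E_h(\mathbf{x}) = \min_{y \in \mathcal{P}} \big( \langle \mathbf{w}_y, \mathbf{x}_{V(h)} \rangle + c_y \big). \tag{2}$$

This term is the minimum (lower envelope) of
several modular functions of $\mathbf{x}_{V(h)}$. We call this
model a maxture by analogy with the mixture
model discussed below. We refer to individual linear
functions $\langle \mathbf{w}_y, \mathbf{x}_{V(h)} \rangle + c_y$ as "soft" patterns^.
Here $\mathbf{w}_y \in \mathbb{R}^{K^2}$ is a weight vector and $c_y \in \mathbb{R}$ is a
constant term for the pattern. Vector $\mathbf{w}_y$ is of the
same size as the labeling patch $\mathbf{x}_{V(h)}$; it can be
visualized as an image (see fig. 2(b)). The variable
$y \in \mathcal{P}$ is called a pattern-switching variable. It is
a discrete variable from the set $\mathcal{P} = \{0, \dots, N_P\}$.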

^The name "soft" refers to the fact that weights $\mathbf{w}_y$ can
take arbitrary values. This is in contrast to other models,
[9, 11], which constrain the weights $\mathbf{w}_y$ to certain values, as
discussed later.
We let the pattern which corresponds to $y = 0$
have the associated weights $\mathbf{w}_0 = 0$. This pattern
assigns a constant value $c_0$ to all labelings $\mathbf{x}_{V(h)}$
and it ensures that $E_h(\mathbf{x}) \le c_0$ for all $\mathbf{x}$. It will be
needed to express models [9, 11] in the form of (2),
as these models explicitly define a cut-off value. It
is also used in our curvature model, where it represents
the maximal cost $f^{\max}$ of the curvature cost
function.
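
To make the lower-envelope form (2) concrete, here is a minimal sketch of how one window cost could be evaluated. It is an illustration only: the 2x2 patch size, the weights, the value of the cut-off and the function names are all made up for the example and are not taken from the learned model.

```python
def soft_pattern_cost(w, c, patch):
    """Cost <w, patch> + c of one soft pattern on a flattened binary patch."""
    return sum(wi * xi for wi, xi in zip(w, patch)) + c


def envelope_cost(patterns, patch):
    """Lower envelope as in (2): returns (minimal cost, index of the pattern)."""
    costs = [soft_pattern_cost(w, c, patch) for (w, c) in patterns]
    best = min(range(len(costs)), key=costs.__getitem__)
    return costs[best], best


# Hypothetical 2x2 patterns (K = 2), patches flattened row by row.
F_MAX = 3.0
patterns = [
    ([0.0, 0.0, 0.0, 0.0], F_MAX),     # y = 0: zero weights, constant cut-off cost
    ([-5.0, -5.0, -5.0, -5.0], 20.0),  # cost 0 only on the all-foreground patch
    ([5.0, 5.0, 5.0, 5.0], 0.0),       # cost 0 only on the all-background patch
]

for patch in ([1, 1, 1, 1], [0, 0, 0, 0], [1, 0, 0, 0]):
    print(patch, envelope_cost(patterns, patch))
```

On the mixed patch neither special pattern fits, so the pattern with zero weights wins and the cost is capped at the cut-off value.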

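The minimization problem of energy (1) is expressed
as

$$\min_{\mathbf{x}} \Big[ E_0(\mathbf{x}) + \sum_{h \in \mathcal{U}} \min_{y \in \mathcal{P}} \big( \langle \mathbf{w}_y, \mathbf{x}_{V(h)} \rangle + c_y \big) \Big], \tag{3}$$

where unary and pairwise terms are collected into
$E_0$. The problem can also be written as a minimization
of a pairwise energy

$$\min_{\mathbf{x}, \mathbf{y}} \Big[ E_0(\mathbf{x}) + \sum_{h \in \mathcal{U}} \big( \langle \mathbf{w}_{y_h}, \mathbf{x}_{V(h)} \rangle + c_{y_h} \big) \Big], \tag{4}$$

where $\mathbf{y} \colon \mathcal{U} \to \mathcal{P}$ is the concatenated vector of all
pattern switching variables^. Since each inner product
$\langle \mathbf{w}_{y_h}, \mathbf{x}_{V(h)} \rangle = \sum_{v \in V(h)} w_{y_h, v} x_v$ is a sum of terms that
couple $y_h$ with a single $x_v$, problem (4)
is a minimization of a pairwise energy function of
discrete variables $\mathbf{x}$, $\mathbf{y}$.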
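
The equivalence of (3) and (4) for a fixed labeling can be checked by brute force on a toy problem. The sketch below is only an illustration under stated assumptions: $E_0$ is set to zero, the 3x3 grid, the 2x2 windows and the three patterns are made-up values, and all function names are hypothetical.

```python
from itertools import product


def windows(ny, nx, k):
    """Top-left corners of all k x k windows fully inside an ny x nx grid."""
    return [(i, j) for i in range(ny - k + 1) for j in range(nx - k + 1)]


def patch(x, h, k):
    """Labels of window h, flattened row by row."""
    i, j = h
    return [x[i + a][j + b] for a in range(k) for b in range(k)]


def pattern_cost(pattern, p):
    w, c = pattern
    return sum(wi * xi for wi, xi in zip(w, p)) + c


def envelope_energy(x, patterns, k):
    """Objective of (3) with E_0 = 0: each window picks its cheapest pattern."""
    hs = windows(len(x), len(x[0]), k)
    return sum(min(pattern_cost(pt, patch(x, h, k)) for pt in patterns) for h in hs)


def joint_energy(x, y, patterns, k):
    """Objective of (4) with E_0 = 0: y maps each window to a pattern index."""
    hs = windows(len(x), len(x[0]), k)
    return sum(pattern_cost(patterns[y[h]], patch(x, h, k)) for h in hs)


def min_over_y(x, patterns, k):
    """Exhaustive minimization of (4) over the switching variables y."""
    hs = windows(len(x), len(x[0]), k)
    return min(joint_energy(x, dict(zip(hs, ys)), patterns, k)
               for ys in product(range(len(patterns)), repeat=len(hs)))


# Made-up integer patterns for 2x2 windows (illustrative values only).
PATTERNS = [([0, 0, 0, 0], 3), ([-2, -2, 1, 1], 4), ([1, 1, -2, -2], 4)]
x = [[1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(envelope_energy(x, PATTERNS, 2), min_over_y(x, PATTERNS, 2))
```

The two printed numbers agree because the switching variables of different windows do not interact, so the joint minimum over $\mathbf{y}$ decomposes into independent per-window minima.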

Problem (4) is NP-hard in general. A subclass
[ ] where minimization of (1) is solvable in
polynomial time is described in Appendix A. However,
this class is very restrictive and is not suitable
for our purpose. In the general case a number
of approximate MAP inference techniques for
pairwise energies can be used, as discussed in sec.
5.

2.1 Pattern-based model 

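Let us relate the above model to the "hard"
pattern-based model defined in [ ] (and also a special
case in [ ]). In [ ] a potential of the following
form is used:

$$E_h(\mathbf{x}) = \begin{cases} c_y & \text{if } \exists y \in \{1 \dots N_P\} \colon \mathbf{x}_{V(h)} = \mathbf{p}^y, \\ c_0 & \text{otherwise.} \end{cases} \tag{5}$$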

This potential assigns cost $c_y$ if the labeling
matches exactly pattern $\mathbf{p}^y \in \mathcal{L}^{V(h)}$ for some
$y \in \{1 \dots N_P\}$ and cost $c_0$ if none of the patterns are
matched. The set of labels for this model is not
necessarily binary. For binary labels (5) can be
rewritten in the form (2) by setting

(6)

where $S$ is a sufficiently large constant. This
is a restricted model since deviations from the
"hard" patterns are not allowed, in contrast to our
model (2). However, this restricted model allows
for an alternative optimization approach which
was proposed in [ ] and seems to correspond to
a tighter relaxation than the standard relaxation
for the pairwise model (4). Also, the hard pattern
potential model [ ] is too restrictive in the following
sense. Function (2), as well as the special
case (5), can exactly represent an arbitrary function
of discrete variables $\mathbf{x}_{V(h)}$ if we allow $2^{|V(h)|}$
patterns. Obviously, in practice if $|V(h)|$ is large
such an approach becomes computationally infeasible.
The challenge is therefore to define a good
model with a small number of patterns, for which
case model (2) seems clearly a better choice.

^We refer to components of $\mathbf{y}$ by $y_h$, while $y$ usually
denotes an independent bound variable.
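
One way to see how a hard pattern can be written as a soft pattern with a large constant $S$ is a Hamming-distance construction: for binary labels, the weights $w_{y,v} = S(1 - 2p^y_v)$ and the constant $c_y + S \sum_v p^y_v$ give cost $c_y + S \cdot d(\mathbf{x}, \mathbf{p}^y)$, where $d$ is the number of mismatching pixels. This is a sketch of one possible construction, not necessarily the exact setting used in (6); the patterns, costs and names below are made up, and it assumes $c_y \le c_0$ and $S > c_0$.

```python
from itertools import product


def hard_potential(x, hard_patterns, c0):
    """Potential (5): cost c_y on an exact match with pattern p^y, else c_0."""
    for p, c_y in hard_patterns:
        if tuple(x) == tuple(p):
            return c_y
    return c0


def hard_as_soft(p, c_y, S):
    """Soft pattern whose cost on binary x is c_y + S * Hamming(x, p)."""
    w = [S * (1 - 2 * pv) for pv in p]  # +S where p_v = 0, -S where p_v = 1
    c = c_y + S * sum(p)
    return w, c


def envelope(x, soft_patterns):
    """Lower envelope (2) over a list of (weights, constant) pairs."""
    return min(sum(wi * xi for wi, xi in zip(w, x)) + c for w, c in soft_patterns)


HARD = [((1, 1, 0, 0), 1), ((0, 1, 0, 1), 2)]  # made-up hard patterns and costs
C0, S = 5, 100                                  # cut-off and a "large" constant
SOFT = [([0, 0, 0, 0], C0)] + [hard_as_soft(p, c, S) for p, c in HARD]

agree = all(envelope(x, SOFT) == hard_potential(x, HARD, C0)
            for x in product((0, 1), repeat=4))
print(agree)
```

Any mismatch costs at least $S$, so for non-matching labelings the zero-weight pattern with constant $c_0$ is always the minimizer.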

2.2 Relation to Mixture Model 

Minimization of a pairwise energy with auxiliary
variables (4) can be interpreted as MAP inference
in a pairwise MRF. The question then arises:
why do we talk about higher-order terms, rather
than just introducing more hidden variables in a
pairwise model? To answer this question we need
to look at the associated probability distributions
and the estimation problems. Here, for simplicity,
we ignore unary and pairwise terms of $E_0$ and
also ignore the observable variables. Let $\mathbf{y}$ be auxiliary
hidden variables in the following, new pairwise
model

$$p(\mathbf{x}, \mathbf{y}) \propto \prod_{h \in \mathcal{U}} \exp\big( -\langle \mathbf{w}_{y_h}, \mathbf{x}_{V(h)} \rangle - c_{y_h} \big). \tag{7}$$

The model of $\mathbf{x}$ is then implied to be

$$p^{\rm mix}(\mathbf{x}) \propto \sum_{\mathbf{y} \in \mathcal{P}^{\mathcal{U}}} \prod_{h \in \mathcal{U}} \exp\big( -\langle \mathbf{w}_{y_h}, \mathbf{x}_{V(h)} \rangle - c_{y_h} \big) = \prod_{h \in \mathcal{U}} \sum_{y \in \mathcal{P}} \exp\big( -\langle \mathbf{w}_y, \mathbf{x}_{V(h)} \rangle - c_y \big). \tag{8}$$

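It can be seen that factors in this model are mixtures
of exponential distributions. The problem of
MAP inference of $\mathbf{x}$, taking the logarithm, can be
written as

$$\arg\max_{\mathbf{x}} \sum_{h} \bigoplus_{y \in \mathcal{P}} \big( -\langle \mathbf{w}_y, \mathbf{x}_{V(h)} \rangle - c_y \big), \tag{9}$$

where $\oplus = \oplus_1$ is an instance of the log-sum-exp operation

$$a \oplus_\beta b = \beta \log\big( e^{a/\beta} + e^{b/\beta} \big), \tag{10}$$

which is a commutative and associative binary operation,
so $\bigoplus_{y \in \mathcal{P}} a_y$ is also unambiguously defined.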

The problem (9) is a discrete optimization with a
difficult objective, so by introducing auxiliary hidden
variables we arrived at a complicated inference
problem. A heuristic could be used to infer $\mathbf{x}$ by
solving the joint MAP in $\mathbf{x}$ and $\mathbf{y}$ and then
discarding the estimate of $\mathbf{y}$; this would lead to the
optimization problem of the desired form:

$$\arg\min_{\mathbf{x}, \mathbf{y}} \sum_{h} \big( \langle \mathbf{w}_{y_h}, \mathbf{x}_{V(h)} \rangle + c_{y_h} \big). \tag{11}$$

So we could have learned a mixture model and
made a wrong use of it by replacing inference
with (11). We prefer instead to state the model as

$$p^{\max}(\mathbf{x}) \propto \prod_{h} \max_{y \in \mathcal{P}} \exp\big( -\langle \mathbf{w}_y, \mathbf{x}_{V(h)} \rangle - c_y \big), \tag{12}$$

where the factors are maxtures of exponential distributions.
Clearly, the model corresponds to (4)
by rewriting it as

$$p^{\max}(\mathbf{x}) \propto \exp\Big\{ -\sum_{h} \min_{y \in \mathcal{P}} \big( \langle \mathbf{w}_y, \mathbf{x}_{V(h)} \rangle + c_y \big) \Big\}. \tag{13}$$

The MAP inference of $\mathbf{x}$ in this model directly corresponds
to (11). So this is a much cleaner correspondence
of the model to the estimation problem.



In fact, there is a smooth transition between the
two models. It is known that $\lim_{\beta \to 0} a \oplus_\beta b = \max(a, b)$,
which says that as distributions get
sharper, their mixture turns into a maximum. It
is also easy to see that under this limit the model
$p^{\rm mix}$ transits to $p^{\max}$, and so does the corresponding
MAP problem for $\mathbf{x}$.
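
The transition from mixture to maximum can be checked numerically. The sketch below assumes the temperature-parameterized log-sum-exp $a \oplus_\beta b = \beta \log(e^{a/\beta} + e^{b/\beta})$; the function name and the test values are hypothetical.

```python
import math


def smooth_max(values, beta):
    """beta * log(sum(exp(v / beta))), computed stably by factoring out the max."""
    m = max(values)
    return m + beta * math.log(sum(math.exp((v - m) / beta) for v in values))


vals = [1.0, 3.0, 2.0]
for beta in (1.0, 0.1, 0.001):
    # The value approaches max(vals) = 3.0 from above as beta shrinks.
    print(beta, smooth_max(vals, beta))
```

For any $\beta > 0$ the result lies between $\max_i a_i$ and $\max_i a_i + \beta \log n$, which is why the gap to the maximum vanishes as $\beta \to 0$.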

3 Curvature priors for Image Labeling 

We will evaluate the usefulness of our curvature 
prior on the closely related problems of image seg- 
mentation and shape inpainting. Given an image 
region with a lack of observations, a good segmen- 
tation model should complete the segmentation in 
this region from the evidence outside of the re- 
gion. This shape inpainting problem is related to 
inpainting of binary images which has been approached
in the continuous setting with several
curvature-related functionals [15, 16]^.

Image labeling with curvature regularization is 
an important topic of research, and both continu- 
ous and discrete formulations for the problem have 
been proposed. Continuous formulations offer ac- 
curate models, but they rely on numerical schemes 
which have to deal with highly nonlinear functions 
and need a good initialization to converge to a 
good local minimum of the cost functional^. Discrete
methods for image labeling with curvature
regularization build on quantization and enumeration
of boundary elements. Until recently, they
were applied only in restricted scenarios where it
is possible to reduce the problem to a search of
the minimal path or minimum ratio cycle.
These cases enjoy global optimality; however, they
restrict the topology or do not allow for arbitrary
regional terms.

Recently, [ ] proposed a discrete method for 
a general setting. This method is able to find 
globally optimal solutions for difficult segmenta- 
tion problems. They formulate the problem as In- 
teger Linear Programming (ILP), where variables 
are indicators of edge and region elements, while 

^There is a vast literature on the general image inpainting
problem; however, these techniques, especially exemplar-based
ones, do not extend to the image segmentation problem,
and are not relevant in the context of this paper.

^For instance, [ ] works with discretized Euler- 
Lagrange equations of the 4th order. 
Figure 3: (a) Continuous shape and its discretization.
Filled circles show boundary locations.
The larger blue window illustrates $V(h)$ at location $h$.
(b, c) Foreground and background patterns: green:
$w_{y,v} = 0$, red: $w_{y,v} = +B$, blue: $w_{y,v} = -B$;
constants $c_y$ are $-4B$ and $+4B$ respectively.



constraints make sure that these variables are con- 
sistent and do correspond to a shape. However, 
this method quantizes the directions of boundary 
elements, which, as we show in the experiment sec- 
tion, may result in large errors in the final segmen- 
tation. The complexity of the model [ ] grows 
very fast with the number of directions, and it 
is not entirely clear how to build a cell complex 
with the required properties for more directions 
than [ ] considers. 

A recent work [ ] claims to give a fast optimal solution
for curvature regularization. However, their
model is a crude approximation to the curvature 
functional. Its 4-neighborhood variant essentially 
penalizes the number of "corners" in the segmen- 
tation - locations where a 2x2 window has 3 pix- 
els foreground and 1 background or vice-versa. It 
assigns zero penalty for horizontal and vertical 
boundaries but diagonal boundaries have maxi- 
mal penalty. The 8-neighborhood variant allows 
for diagonal lines at zero cost but penalizes ver- 
tical and horizontal lines. Our model with 2x2 
windows can also implement the 4-neighborhood 
model [20]. However, as we argue, a larger window
is necessary to capture the curvature of a shape
represented by binary pixel labeling.

4 Learning a Curvature Cost Model

Suppose we are given a shape $S \subset \mathbb{R}^2$ such that
we can calculate the curvature $\kappa$ at every point
of the boundary, $\partial S$. Let $f(\kappa) \ge 0$ be a curvature
cost function, which defines a desired penalty
on curvature; in this paper we consider
$f(\kappa) = \min(\kappa^2, f^{\max})$. Let the total cost of the shape be
Figure 4: Problem definition and motivation
of large-sized windows. Examples above
show discrete labelings on a pixel grid with a
corresponding red continuous curve. Note that there are
infinitely many continuous curves which give rise
to the same discrete labelling; two examples are
given in (a) and (b). The red curve in (b) is probably
the one with lowest curvature given the discrete
labelling. Our goal is to find an energy function
which maps every discrete labelling to the
corresponding cost of the continuous curve with
lowest curvature. (c) makes the important point
that larger sized windows have inherently a better
chance of predicting well the curvature at the center
of the window. In (c) the green window is of
size 3x3, while in (b) it is of size 5x5. The
underlying discrete labelling is identical in both cases
and the red curve is the optimal (lowest curvature)
continuous curve given the window. The crucial
point is that the curvature of the continuous curve,
at the center of the window, is very different in (b)
and (c). Note that this problem is to some extent
mitigated by the fact that the total cost of segmentation
is the sum of costs along the boundary.

$\int_{\partial S} f(\kappa)\, dl$. Our goal is to approximate this integral
by the sum

$$\sum_{h \in \mathcal{U}} E_h(\mathbf{x}), \tag{14}$$

where functions $E_h$ operate over a discretized representation
of the shape, $\mathbf{x}$, and are of the form (2)
with weights $\mathbf{w}$, $\mathbf{c}$. Here $\mathbf{w}$ and $\mathbf{c}$ denote the
concatenated vectors of all weights $\mathbf{w}_y$ and $c_y$, respectively.
The learning problem is to determine the
pattern weights $\mathbf{w}$, $\mathbf{c}$ such that the approximation
is most accurate. Since the mapping of continuous
to discrete curves is a many-to-one mapping, we
further formalize our exact goal in figure 4. In the
figure we also motivate the important aspect that
larger windows are potentially superior.
We first restrict the sum in (14) to take into
account only boundary locations. We call $h$ a
boundary location for shape $\mathbf{x}$ if the 2x2 window
at $h$ contains some pixels which are labeled foreground
as well as some pixels which are labeled
background, as illustrated in fig. 3. We constrain
all soft patterns to be non-negative ($\langle \mathbf{w}_y, \mathbf{x} \rangle + c_y \ge 0$)
and introduce two special patterns (fig. 3b, c),
which have cost 0 for locations where the 2x2
window at location $h$ contains only background
or foreground pixels. These patterns make $E_h(\mathbf{x})$
vanish over all non-boundary locations, therefore
such locations do not contribute to the sum (14).
The learning task is now to determine $E_h(\mathbf{x})$ such
that at each boundary location the true cost $f(\kappa)$
is approximated. In this way (14) corresponds
to the desired integral if we neglect the
fact that the number of boundary locations
only approximates the true length of the boundary.
Note that the number of boundary locations corresponds
to the "Manhattan" length of the boundary.
We will come back to this problem in sec. 6.
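
The definition of boundary locations can be illustrated with a few lines of code. This is a minimal sketch: windows are indexed by their top-left corner, which is an assumption made here for convenience, and the function name and the toy shape are hypothetical.

```python
def boundary_locations(x):
    """Top-left corners (i, j) of 2x2 windows that contain both labels."""
    ny, nx = len(x), len(x[0])
    out = []
    for i in range(ny - 1):
        for j in range(nx - 1):
            s = x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]
            if 0 < s < 4:  # neither all background (0) nor all foreground (4)
                out.append((i, j))
    return out


# A 2x2 foreground square in the middle of a 4x4 grid.
shape = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
locs = boundary_locations(shape)
print(len(locs), locs)
```

For this square the count of boundary locations equals its Manhattan perimeter of 8, while the single window lying fully inside the square is not a boundary location.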

Point-wise learning procedure. Let us assume
that in a local $K \times K$ window, shapes of
low curvature can be well-approximated by simple
quadratic curves^. The idea is to take many
examples of such shapes and fit $E_h(\mathbf{x})$ to approximate
their cost. We consider many quadratic
shapes $(S^i)_{i=1}^{N}$ in the window $K \times K$ and derive
their corresponding discretizations on the pixel grid,
$(\mathbf{x}^i)_{i=1}^{N}$. Each continuous shape has an associated
curvature cost $f^i = f(\kappa^i)$ at the central boundary
location. We formulate the learning problem as
minimization of the average approximation error:

$$\arg\min_{\mathbf{w}, \mathbf{c}} \sum_i \big| E_h(\mathbf{x}^i) - f^i \big| \quad \text{s.t.} \quad \begin{cases} \mathbf{w}_0 = 0,\ c_0 = f^{\max}, \\ E_h(\mathbf{x}) \ge 0 \quad \forall \mathbf{x}, \end{cases} \tag{15}$$

where the first constraint represents the special
implicit pattern ($\mathbf{w}_0 = 0$, $c_0 = f^{\max}$), which ensures
that $E_h(\mathbf{x}) \le f^{\max}$. The second constraint makes
sure that the cost is non-negative. It is important
for the following reason: the formulation of the
approximation problem does not explicitly take
into account "negative samples", i.e. labellings
which do not originate from smooth curves, and
which must have high cost in the model. However,
requiring that all possible negative samples
in a $K \times K$ window have high cost would make
the problem too constrained. The introduced
non-negativity constraint is tractable and not too
restrictive. This problem appears difficult, since
$E_h(\mathbf{x})$ is itself a concave function in the parameters
$\mathbf{w}$, $\mathbf{c}$. We approach (15) by a k-means like
procedure with a specially constructed initialization:

^Note, based on our definition in fig. 4 we select
quadratic curves which are likely to be the ones of lowest
curvature (among all curves) for the corresponding discrete
labelling.

Alg. 1. Iterative Factor Discovery
Input: $\mathbf{x}^i$, $\mathbf{w}$, $\mathbf{c}$

Repeat till convergence or maximum iterations:

1. For all training images $i$ find best matching
patterns $y^i = \arg\min_y [\langle \mathbf{w}_y, \mathbf{x}^i \rangle + c_y]$

2. For all $y = 1 \dots N_P$ refit $(\mathbf{w}_y, c_y)$:
$(\mathbf{w}_y, c_y) = \arg\min_{\mathbf{w}_y, c_y} \sum_{i \mid y^i = y} |\langle \mathbf{w}_y, \mathbf{x}^i \rangle + c_y - f^i|$   (16)

The refitting step (16) is a linear optimization
which can be solved exactly. The constraint
in (16) is an equivalent representation of the constraint
$\langle \mathbf{w}_y, \mathbf{x} \rangle + c_y \ge 0\ \forall \mathbf{x}$, imposed by (15).

The initialization and results of applying this
learning procedure are discussed in sec. 6.
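
The alternation in Alg. 1 can be sketched in a few lines. The version below is deliberately simplified and is not the full refitting step: it keeps the weights fixed and refits only the constants $c_y$, for which the least-absolute-deviation solution is a median. Non-negativity is enforced through the bound $c_y \ge -\sum_v \min(0, w_{y,v})$, which for binary labels is equivalent to $\langle \mathbf{w}_y, \mathbf{x} \rangle + c_y \ge 0$ for all $\mathbf{x}$. All names and the toy data are hypothetical.

```python
from statistics import median


def cost(w, c, x):
    """Cost <w, x> + c of one soft pattern on a flattened binary patch."""
    return sum(wi * xi for wi, xi in zip(w, x)) + c


def assign(patterns, xs):
    """Step 1: index of the lowest-cost pattern for every training patch."""
    return [min(range(len(patterns)), key=lambda y: cost(*patterns[y], x))
            for x in xs]


def refit_constants(patterns, xs, fs, ys):
    """Simplified step 2: weights stay fixed, only c_y (y >= 1) is refitted."""
    new = [patterns[0]]  # pattern 0 (zero weights, cut-off constant) is kept
    for y in range(1, len(patterns)):
        w, c = patterns[y]
        resid = [f - cost(w, 0.0, x) for x, f, yi in zip(xs, fs, ys) if yi == y]
        if resid:
            lower = -sum(min(0.0, wi) for wi in w)  # keeps the pattern non-negative
            c = max(median(resid), lower)
        new.append((w, c))
    return new


def discover(patterns, xs, fs, max_iter=20):
    """Alternate assignment and refitting until the patterns stop changing."""
    for _ in range(max_iter):
        ys = assign(patterns, xs)
        updated = refit_constants(patterns, xs, fs, ys)
        if updated == patterns:
            break
        patterns = updated
    return patterns, assign(patterns, xs)


# Toy data: two-pixel "patches" with target costs (made-up values).
init = [([0.0, 0.0], 3.0), ([1.0, 1.0], 0.0)]
xs = [[0, 0], [1, 0], [0, 1]]
fs = [0.5, 1.5, 1.5]
fitted, labels = discover(init, xs, fs)
print(fitted, labels)
```

As with k-means, the assignment step picks the cheapest pattern rather than the best-fitting one, so the approximation error is not guaranteed to decrease monotonically; a full refit of $(\mathbf{w}_y, c_y)$ would require a linear programming solver.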

5 Inference

We examined several standard MAP inference
techniques for pairwise MRFs, and found the following
two-stage procedure to work best for our
problem. First, the TRW-S algorithm [21, 22] is
run for a fixed number of iterations and an
initial solution is obtained by rounding the tree
min-marginals. Second, a Block-ICM procedure
improves on this initial solution.
5.1 TRW-S

In contrast to many other pairwise MRFs encountered
in computer vision, our model has a very
large number of pairwise links. We developed
the following memory-efficient implementation of
TRW-S for models with pairwise interactions between
variables with few states and variables with
many states. For an image of size $N_y \times N_x$ and a
model with $N_P$ patterns of size $K \times K$, there are in
total $O(N_y N_x K^2)$ pairwise terms needed to represent
the pattern potentials (pairwise energy (4)
can be illustrated as a bipartite graph, see fig. 5(a)).
The algorithm in [ ] needs to keep a message
(vector of size $N_P$) for each edge. This makes
the original procedure [ ] extremely memory intensive.
Since each pixel has only two labels and
there are $N_P$ labels for the pattern switching variable,
we can improve on the memory requirement with
a little computational overhead. We make the following
modification to the implementation [21, fig. 3].
We store reparametrized unaries, $\theta_v(x_v)$
and $\theta_h(y_h)$ ($O(N_y N_x N_P)$ memory), and messages
only in the direction patterns→pixels, $m_{hv}(x_v)$
($O(N_y N_x K^2)$ memory). When the reverse message
is requested by the algorithm it is computed
on the fly using the equation

$$m_{vh}(y_h) = \min_{x_v} \big[ \gamma_{vh}\, \theta_v(x_v) - m_{hv}(x_v) + \theta_{vh}(x_v, y_h) \big], \tag{17}$$

which is $O(N_P)$ computations. To completely
specify the algorithm we have to choose the ordering
of variables and the parameters $\gamma$. We
can specify the ordering corresponding to longer
chains, which potentially provides faster convergence,
or the ordering where all $\mathbf{x}$ precede all $\mathbf{y}$,
corresponding to short 1-edge chains, which makes
the computation parallelizable. The parameters $\gamma$
are selected following the recommendation in [21].

Figure 5: (a) Graphical model for the energy (4).
(b) Pairwise terms connecting pixel labeling $x_v$
and pattern-switching variable $y_h$. Circles show
possible states: two states for $x_v$ and $N_P$ states
for $y_h$.
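
The on-the-fly computation of the reverse message can be sketched as follows. The sketch assumes that the pairwise term between a pixel and a window is $\theta_{vh}(x_v, y_h) = w_{y_h, v}\, x_v$, as suggested by the inner product in (4); the function name, argument layout and numbers are hypothetical, and the message normalization used in practical TRW-S implementations is omitted.

```python
def reverse_message(gamma, theta_v, m_hv, w_col):
    """Message from pixel v to window h, one value per pattern y.

    theta_v: [theta_v(0), theta_v(1)], the reparametrized unary of the pixel
    m_hv:    [m_hv(0), m_hv(1)], the stored message from window h to pixel v
    w_col:   w_col[y] is the weight that pattern y puts on this pixel
    """
    a0 = gamma * theta_v[0] - m_hv[0]  # x_v = 0: the pairwise term is 0
    a1 = gamma * theta_v[1] - m_hv[1]  # x_v = 1: the pairwise term is w_col[y]
    return [min(a0, a1 + w) for w in w_col]


# Made-up numbers for one pixel and three patterns.
msg = reverse_message(0.5, [2.0, 0.0], [0.5, -0.5], [0.0, 3.0, -1.0])
print(msg)
```

Because the pixel has only two states, the two bracketed terms are computed once and each of the $N_P$ entries then costs a single addition and comparison.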

This modified version of TRW-S requires only
$O(N_y N_x (K^2 + N_P))$ memory and runs in
$O(N_y N_x K^2 N_P)$ time per full-pass iteration. In
practice this results in about 5 seconds per iteration
for an image of size 158x128; however, a significant
number of iterations was required to achieve
good results (e.g. up to 4000), which is a known
issue for dense graphs [23].
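
A back-of-the-envelope count shows the size of the memory saving. The numbers below are illustrative only: the image size 158x128 and the 8x8 windows come from the text, while the number of patterns (96) is a hypothetical value chosen for the example, and the function name is made up.

```python
def trws_storage(ny, nx, k, n_p):
    """Stored numbers: (one N_P-vector per edge, compact scheme)."""
    n_windows = (ny - k + 1) * (nx - k + 1)  # windows fully inside the grid
    n_edges = n_windows * k * k              # one edge per (window, pixel) pair
    naive = n_edges * n_p                    # a message of size N_P on every edge
    compact = (n_edges * 2                   # 2-state messages window -> pixel
               + n_windows * n_p             # unaries of switching variables
               + ny * nx * 2)                # unaries of pixels
    return naive, compact


naive, compact = trws_storage(158, 128, 8, 96)
print(naive, compact, naive / compact)
```

The ratio grows with the number of patterns, since in the compact scheme that number multiplies only the count of windows and no longer the count of edges.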

5.2 Block-ICM 

It often happens that the tree min-marginals of 
TRW-S for some pixels are "indecisive", i.e. pos- 
sibly many different labellings have a low energy. 
In this case the solution picked by our pixel- 
independent rounding schemes may be rather 
poor. We found that further local improvements 
can significantly decrease the energy. Block-ICM 
tries to improve the current labeling $\mathbf{x}$ by switching
states of a small block of $k$ variables at a
time. Obviously, its complexity grows exponentially
in $k$, hence $k$ must be low (we use $k = 6$).
During Block-ICM, the blocks are selected densely 
around the current boundary of x. For an image 
of size 158x128 Block-ICM needs about 3 minutes 
to converge. 
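
The block moves can be sketched generically for any energy function on binary labelings. This is a minimal illustration, not the implementation used in the experiments: blocks are given explicitly instead of being selected around the current boundary, the energy is re-evaluated from scratch for every candidate, and the 1-D toy energy is made up.

```python
from itertools import product


def block_icm(x, energy, blocks, max_sweeps=10):
    """Greedy block moves: for each block try all joint states, keep the best."""
    x = list(x)
    best = energy(x)
    for _ in range(max_sweeps):
        improved = False
        for block in blocks:
            best_state = [x[i] for i in block]  # current state of the block
            for state in product((0, 1), repeat=len(block)):
                for i, s in zip(block, state):
                    x[i] = s
                e = energy(x)
                if e < best:
                    best, best_state, improved = e, list(state), True
            for i, s in zip(block, best_state):  # commit the best state found
                x[i] = s
        if not improved:
            break
    return x, best


def toy_energy(x):
    """Number of label changes along a chain plus a penalty if x[0] is 0."""
    return sum(abs(a - b) for a, b in zip(x, x[1:])) + (1 - x[0])


start = [0, 1, 0, 1, 0, 0]
small_blocks = [(0, 1), (2, 3), (4, 5), (1, 2), (3, 4)]
print(block_icm(start, toy_energy, small_blocks))
print(block_icm(start, toy_energy, [tuple(range(6))]))
```

With blocks of two variables the procedure can stop in a local minimum, whereas a single block covering all six variables amounts to exhaustive search; the cost of each block is $2^k$ energy evaluations.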

5.3 LP relaxation 

TRW-S is a suboptimal dual solver for linear pro- 
gram relaxation of the discrete pairwise energy 
minimization. The relaxation (see e.g. [^^] for an 
overview) is obtained by linearizing objective (4) 
and dropping the integrality constraints. In par- 
ticular, it replaces binary variables Xy G {0,1} 
with relaxed variables Xy G [0,1]. The optimal 
relaxed labeling may be of interest even when the 
relaxation is not tight. However, TRW-S does not 
compute the primal relaxed labeling and it may 
get stuck in a suboptimal point. On the other 
hand solving the full primal LP seems infeasible 
since it requires 0{NyNxK^Np) variables. The 
following heuristic can be applied. Note, although, 
as we will see, the results may be improved with 
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Figure 6: Restricted LP. (a) Unaries: black - 
foreground, white - background, gray - area to 
be inpainted; (b) Tree min-marginals of TRW-S: 
a thin hne shows the 0-level contour; (c) Round- 
ing of TRW-S solution; (d) The reduced problem; 
(e) Relaxed primal solution of the reduced prob- 
lem (with 0-level contour); (f) Rounding of relaxed 
primal solution. We also show lower bound (LB) 
for relaxation and energie (E) for a discrete solu- 
tion. Note, the discrete, ground truth circle has 
an energy of E = 1.96, which is still lower than 
the best solution found (f). 

this heuristic, we did not use it in our experimen- 
tal section due to its computational complexity. 
The procedure is as follows. We use the dual solution
of TRW-S to greedily fix a large part of the primal
relaxed variables (both those corresponding to pixel
labels $x$ and those corresponding to patterns $y$).
When the tree min-marginal for a label in a pixel
exceeds the minimal tree min-marginal in that pixel
by more than a threshold, we fix the corresponding
primal variable to 0 and eliminate it. This gives a
restricted linear program, which can then be solved
by a primal method. The example in fig. 6 illustrates
these steps for a circle inpainting problem of the
type shown in fig. 8. We make two observations.
First, the restricted LP attains its optimum at a
"fractional" relaxed solution, so there exists an
integrality gap and we cannot obtain an optimal
discrete solution. Second, the value of the objective
at the optimum of the restricted LP differs from the
lower bound found by TRW-S, which means that TRW-S
has converged to a sub-optimal dual solution.
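
The variable-fixing step described above can be sketched as follows. This is only a minimal illustration, assuming the tree min-marginals are available as an array indexed by pixel and label; the function name `fix_by_min_marginals`, the array layout and the threshold value are hypothetical and not taken from our implementation.

```python
import numpy as np

def fix_by_min_marginals(min_marginals, threshold):
    """Return a boolean mask of primal variables to fix to 0.

    min_marginals: array of shape (n_pixels, n_labels) with tree
    min-marginals (assumed to come from a dual solver such as TRW-S).
    A label is fixed to 0 when its min-marginal exceeds the smallest
    min-marginal in the same pixel by more than `threshold`.
    """
    mm = np.asarray(min_marginals, dtype=float)
    gap = mm - mm.min(axis=1, keepdims=True)
    return gap > threshold

# Toy example with three pixels and two labels (illustrative numbers).
mm = np.array([[0.0, 5.0],   # label 1 is clearly worse: fix it to 0
               [2.0, 2.3],   # ambiguous pixel: keep both variables free
               [4.0, 0.5]])  # label 0 is clearly worse: fix it to 0
fixed = fix_by_min_marginals(mm, threshold=1.0)
free_variables = int((~fixed).sum())  # variables left in the restricted LP
```

Only the variables that remain free enter the restricted linear program; a larger threshold keeps more variables and yields a larger but less restricted LP.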

Let us also briefly comment on the performance
of Belief Propagation (BP), which we found inferior.
We used the variant called sequential (min-sum)
BP, obtained from TRW-S by setting all $\gamma$ to 1
as described in [21]. In [9] it was reported that BP
performs best for texture denoising with soft
pattern-potentials, while TRW-S performed poorly.
For our model we observed the opposite: BP may
produce very poor results, especially for problems
where there is a large area of uncertainty
(shape-inpainting). Note that we also tried damping
and different re-ordering heuristics for BP, but
without success. We believe that an interesting
direction for future work is to thoroughly compare
various optimization schemes for various types of
soft pattern-based potentials.

6 Experiments

We performed several kinds of experiments. 

1. We discuss the learning procedure of our
model and investigate the approximation quality
it achieves. For the latter we generated continuous
shapes, for which the true total cost can be computed
precisely, and compared that to our model.
Note that to make inference and learning feasible we
can only use a restricted number of patterns of
limited size. We show that this gives a reasonable
approximation of the desired cost functional.
In theory, by increasing the resolution (size
of patterns) and the number of patterns one may
achieve an approximation of arbitrary accuracy.

2. The second set of experiments studies the task 
of "shape-inpainting" where the optimal segmen- 
tation (shape) has to be inferred, while only a few 
boundary conditions are given. It provides a good 
way of inspecting our prior shape model and as- 
sessing whether it corresponds to our intuitive no- 
tion of natural shapes. 

3. The next experiment is on standard interac- 
tive image segmentation. Here we compare length 
versus curvature regularization. 

4. Finally, we analyze the properties of the curvature
model of [ ] and provide some comparison
with our model.

Figure 7: Cost approximation. (a) The performance of Alg. 1 measured in terms of the objective (15).
The red and green dashed curves show training and test error respectively. (b) Point-wise approximation
cost with initial patterns, and (c) point-wise approximation cost after 10 iterations of Alg. 1. In both
(b,c): each green point is a test sample; the blue line shows the desired true cost; the green line the mean
approximation cost; and the red lines show 3 x standard deviation bounds. (d,e) Approximate total cost
/ length vs. true total cost / length for circles (d) and Fourier shapes (e). (f) Examples of training and
test patches used in (a-c). (g) Examples of discretized shapes for circular shapes and Fourier shapes used
in (d,e).



6.1 Cost approximation 

To learn our curvature model we used 96 patterns
of size K = 8. For the learning we sampled
N = 10000 random quadratic curves passing close
to the center of a $K \times K$ pixel patch (fig. 7(f)). The
initial model is built as follows. We split the training
patches into 32 orientations, using the tangent
of a curve, and 3 different curvature intervals.
This gives 96 bins in total. For each bin we fit
a separate linear function using step 2 of Alg. 1,
which results in an initial set of 96 patterns. We
then run several iterations of Alg. 1. The estimated
error of the objective (15) is shown in fig. 7(a). We
see that both training and test error decrease over
time.
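
The initial binning into 32 orientations and 3 curvature intervals can be sketched as below. This is an illustration under stated assumptions: the curvature interval boundaries in `CURV_EDGES` are hypothetical placeholders, orientations are assumed to cover the full range $[0, 2\pi)$, and the function name `bin_index` is ours for this example only.

```python
import numpy as np

N_ORIENT = 32                        # number of orientation bins
CURV_EDGES = np.array([0.05, 0.15])  # hypothetical curvature interval boundaries

def bin_index(tangent_angle, curvature):
    """Map (tangent angle in radians, curvature) to a bin index in 0..95."""
    angle = np.mod(np.asarray(tangent_angle, dtype=float), 2.0 * np.pi)
    orient = (angle / (2.0 * np.pi) * N_ORIENT).astype(int)
    orient = np.minimum(orient, N_ORIENT - 1)  # guard against rounding up to 32
    curv = np.searchsorted(CURV_EDGES, np.abs(curvature))  # 0, 1 or 2
    return orient * (len(CURV_EDGES) + 1) + curv

# Three sample curves: the first and last bin, and one in between.
angles = np.array([0.0, np.pi, 2.0 * np.pi - 1e-9])
curvatures = np.array([0.0, 0.1, 0.3])
bins = bin_index(angles, curvatures)
```

Each training patch is assigned to one of the 96 bins in this way, and one linear function is then fitted per bin to obtain the initial patterns.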

The initial and final point-wise cost (i.e. for a
patch) is illustrated in Fig. 7 (b) and (c) respectively.
It can be seen that the mean of the approximated
cost is very close to the true cost function, but
the variance is considerable. This problem should
be mitigated when the cost is summed up along the
full boundary, as the errors average out. To verify
this, we sampled larger shapes for which we can
compute the true total cost exactly and then
compared it to that
of our model. Fig. 7(d,e) shows this experiment
for two classes of shapes: (d) circular shapes of
size 100x100 with random radius (uniformly sampled
in [5, 50] pixels) and subpixel shift; (e) complex
shapes created using the Fourier series
$\rho(\alpha) = a_0 + \sum_{k \geq 1} \left( a_k \sin(k\alpha) + b_k \cos(k\alpha) \right)$
in polar coordinates with random coefficients $(a, b)$.
Derivatives, curvature and the total cost integral can be
computed accurately for these shapes. We then
measure the approximation error relative to the
true length of the curve. Figure 7(d,e) shows that
the variance is reduced, especially for shapes with
low average curvature, which are predominant.
The plots also reveal that we consistently
overestimate the true curvature cost. This
problem is related to the fact that we approximate
the integral along the boundary by the sum
over boundary locations, which corresponds to the
"Manhattan" length, which is usually higher than
the Euclidean length. While this is not essential
for getting a useful model for the shape prior, we
discuss in Appendix B a way of reducing this error
by adjusting the pattern costs.

Figure 8: Inpainting of a corner and a circle. The
green boxes show the area to be inpainted; the
side length of the green box, in pixels, is given
below each figure. Pixels in gray show the estimated
solution. Note that the boundary conditions are
different: right-angle boundary condition (top), circle
boundary condition (bottom).
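
To illustrate how the true length and total cost of such a Fourier shape can be computed, the following sketch evaluates the standard curvature formula for a polar curve, $\kappa = (\rho^2 + 2\rho'^2 - \rho\rho'') / (\rho^2 + \rho'^2)^{3/2}$, and integrates numerically. It assumes, for illustration only, that the cost is the squared curvature integrated along the curve; the function name `polar_fourier_cost` and the sample coefficients are hypothetical.

```python
import numpy as np

def polar_fourier_cost(a0, a, b, n_samples=20000):
    """Length and integrated squared curvature of the polar curve
    rho(alpha) = a0 + sum_k a[k-1]*sin(k*alpha) + b[k-1]*cos(k*alpha)."""
    a = np.asarray(a, dtype=float)[:, None]
    b = np.asarray(b, dtype=float)[:, None]
    k = np.arange(1, a.shape[0] + 1)[:, None]
    alpha = np.linspace(0.0, 2.0 * np.pi, n_samples, endpoint=False)[None, :]
    s, c = np.sin(k * alpha), np.cos(k * alpha)
    rho = a0 + (a * s + b * c).sum(axis=0)
    d1 = (k * (a * c - b * s)).sum(axis=0)         # first derivative of rho
    d2 = (-(k ** 2) * (a * s + b * c)).sum(axis=0)  # second derivative of rho
    speed = np.sqrt(rho ** 2 + d1 ** 2)             # ds / d(alpha)
    kappa = (rho ** 2 + 2.0 * d1 ** 2 - rho * d2) / speed ** 3
    d_alpha = 2.0 * np.pi / n_samples
    length = speed.sum() * d_alpha
    cost = (kappa ** 2 * speed).sum() * d_alpha     # assumed cost: kappa^2
    return length, cost

# A circle of radius 10 (all Fourier coefficients zero) and a perturbed shape.
circle_length, circle_cost = polar_fourier_cost(10.0, [0.0], [0.0])
shape_length, shape_cost = polar_fourier_cost(20.0, [1.5, 0.0, 0.8], [0.0, 2.0, 0.0])
```

For the circle the result can be checked by hand: the length is $2\pi r$ and the integrated squared curvature is $2\pi / r$.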

6.2 Shape Inpainting 

The goal is to reconstruct the full shape, while
only some parts of the shape are visible to the
algorithm. This is a useful test to inspect our shape
prior. Let $F \subset V$ be the set of pixels restricted to
foreground (shape) and $B \subset V$ the pixels restricted to
background. The unary terms of (1), $\theta_v(x_v)$, are
set to $\infty$ if label $x_v$ contradicts the constraints
and 0 otherwise. This ensures that the correct
segmentation is inferred in the region $F \cup B$.

Figure 9: Inpainting with a straight line boundary
condition. The green box is of size 8, 16 and
24 pixels respectively from top to bottom. The 
numbers show the model cost of the estimated so- 
lution (top line) and the cost of the ground truth 
(discretized) straight line (bottom line). 

In the unknown region $V \setminus (F \cup B)$ all unaries
are exactly 0. Fig. 8, 9 show results for different
inpainting problems with various boundary condi- 
tions corresponding to inpainting of some simple 
shapes. The main conclusion is that all results 
look reasonably good. Note, for a tiny circle in 
fig. 8 (bottom left) the reconstruction looks more 
like an oval than a circle. This is an expected re- 
sult, and visually acceptable, since the boundary 
condition (black pixels) may correspond to either 
of the shapes and an oval has a lower cost. Also, 
we see that for some line inpainting examples in 
fig. 9 the result deviates slightly from the ground 
truth (straight line). Given that the cost of our
solution is almost always lower than the cost of the
ground truth, it is quite likely that the problem is
due to an imperfect model and not due to a local
minimum of the optimization. As mentioned before,
one way to overcome this problem is to allow
for more patterns. We also tested our algorithm 
on inpainting real world images, and compared its 
results with those obtained by using a pairwise 
Markov Random Field formulation that tries to 
reduce the boundary length. The results can be 
seen in figures 10 and 1. It can be seen that the 
higher-order model that encodes curvature pro- 
duces shape completions with smooth boundaries. 
An example combining curvature and length priors
is shown in fig. 11.

Figure 10: Two examples of automatic shape completion
of an occluded object. In both cases the
left result is with a pure curvature prior and the
right result with a pure length prior (8-connected).
Note that the yellow curve (and a part of the green
curve) indicates the original user-defined segmentation.
The user then defines the green area. Inside
the green area, the method automatically finds
the shape completion (blue curve).




Figure 11: Combining length and curvature for 
inpainting: (a) pure curvature, (b) curvature + 
length, (c) curvature + more length. 
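
The unary construction used for shape inpainting in this section can be sketched as follows. This is a minimal illustration, assuming the foreground and background constraints are given as boolean masks; the function name `inpainting_unaries` and the array layout (last axis indexes the label) are hypothetical.

```python
import numpy as np

def inpainting_unaries(fg_mask, bg_mask):
    """Unary costs theta[..., label]: infinity if the label contradicts
    a constraint, 0 otherwise (including the whole unknown region)."""
    fg = np.asarray(fg_mask, dtype=bool)
    bg = np.asarray(bg_mask, dtype=bool)
    if np.any(fg & bg):
        raise ValueError("a pixel cannot be constrained to both labels")
    theta = np.zeros(fg.shape + (2,))
    theta[fg, 0] = np.inf  # foreground pixels must not take label 0
    theta[bg, 1] = np.inf  # background pixels must not take label 1
    return theta

# One image row: constrained foreground, unknown, constrained background.
fg = np.array([[True, False, False]])
bg = np.array([[False, False, True]])
theta = inpainting_unaries(fg, bg)
```

With these unaries the data term carries no information inside the unknown region, so the inferred shape there is determined by the prior alone.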

6.3 Image Segmentation 

We use a simple model for the task of interactive
FG/BG image segmentation, similar to [ ].
Based on the user brush strokes (fig. 12(a)) we
compute likelihoods using a Gaussian mixture
model (GMM) with 10 components. The difference
of the unaries $\theta_v(1) - \theta_v(0)$ corresponds to
the negative log-likelihood ratio of foreground and
background. Fig. 12(e) shows results when using a
simple pairwise MRF (8-connectivity), which puts
a prior on the length of the boundary. By varying
the strength of the prior we achieve various
results; however, none of the results is satisfying.
Note that the length prior is, in contrast to [25], not
gradient-sensitive since the legs of the giraffe do
not have an edge with sharp contrast. Results for
our curvature model for various strengths of the 
prior are shown in fig. 12(f). Note, no additional 
length prior is added. We clearly see that the curvature
prior is able to properly segment the legs
of the giraffe, compared to the length prior. Increasing
the strength of the prior above some limit
(1000) has almost no effect on the smoothness of
the solution, because each local 8x8 window is already
maximally smooth according to the model.
Note that our result, e.g. fig. 12(f,100), is visually
superior to [ ], fig. 12(c), despite the fact that
we use a grid with much coarser resolution (see
detailed discussion below).

Figure 12: Image segmentation. (a) Image
with foreground (green) and background (blue)
seeds; (b) Color based unary potential costs
(red implies foreground-favoring, blue implies
background-favoring). (c) Detail of the segmentation
result from [ ] (top) and our result (f,100). (d)
Detail of the segmentation of [ ] and ours. (e) Segmentation
with length prior (8-connected model) for
various strengths of the prior (numbers below the
figure). (f) Segmentation with curvature prior (our
model) for various strengths of the prior.
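
The relation between the color likelihoods and the unary difference used in this section can be sketched as below. This is an illustration only: it assumes GMMs with diagonal covariances whose parameters are already fitted, and the helper names `gmm_log_likelihood` and `unary_difference` are hypothetical.

```python
import numpy as np

def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of each row of x under a diagonal-covariance GMM.
    Shapes: x (n, d), weights (c,), means (c, d), variances (c, d)."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    diff = x[:, None, :] - means[None, :, :]
    log_comp = (np.log(weights)[None, :]
                - 0.5 * np.log(2.0 * np.pi * variances).sum(axis=1)[None, :]
                - 0.5 * (diff ** 2 / variances[None, :, :]).sum(axis=2))
    m = log_comp.max(axis=1, keepdims=True)  # log-sum-exp for stability
    return (m + np.log(np.exp(log_comp - m).sum(axis=1, keepdims=True)))[:, 0]

def unary_difference(pixels, fg_gmm, bg_gmm):
    """theta(1) - theta(0): negative log-likelihood ratio of FG vs. BG."""
    return -(gmm_log_likelihood(pixels, *fg_gmm)
             - gmm_log_likelihood(pixels, *bg_gmm))

# Toy 1-D example with one component per model (illustrative numbers).
fg_gmm = (np.array([1.0]), np.array([[1.0]]), np.array([[1.0]]))
bg_gmm = (np.array([1.0]), np.array([[0.0]]), np.array([[1.0]]))
diff = unary_difference(np.array([[1.0], [0.0]]), fg_gmm, bg_gmm)
```

A negative difference means that the foreground label is cheaper for that pixel, a positive one that the background label is cheaper.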

6.4 Analysis of the model of Schonemann 
et al. [ ] 

Figure 13: Properties of [19]. In (a,b) the problem
is to find the optimal shape with lowest curvature
given the boundary conditions: the 2 end-points
in (a) and 2 terminal edge-elements in (b).
In (a,b) the black line shows an optimal solution
of model [ ]. The blue line in (a) is the closest
approximation of a straight line. Note that in both
cases (a,b) the model [ ] has multiple solutions:
(a) two solutions, each having one corner point;
(b) a family of optimal solutions, each having
4 corner points. (c) Inpainting results by our model
for the same problem as in (a).

The main property of the model of [ ] is that
there is a pre-defined set of quantized directions.
For our analysis, we considered a restricted scenario
(Fig. 13) where it is evident that the optimal
shape has to be described by a path in the
graph. We used a 16-connected graph. For two
consecutive edge elements of the boundary, we approximated
the squared curvature as $A^2 \frac{2}{l_1 + l_2}$ [ ]
(functional G2), where $A$ is the angle between the
line segments and $l_1$, $l_2$ their lengths. This is similar
to [ ] but in a symmetric form. Another (not
essential) difference of our re-implementation of [ ]
is that they construct edge elements by subdividing
each pixel, whereas we model edge elements
by end-points in a discrete grid (where edges can
also intersect). Fig. 13 reveals the problem of
discretized directions. We observed that lines at
directions which are not perfectly modeled in [ ]
(e.g. the line at 1/4 slope in fig. 13(a)) have a
very large approximation error. Indeed, the best
approximation to the line in fig. 13(a) has many
small "steps", whereas the optimal boundary in
the model of [ ] makes only one large "step". We
believe that this effect is the reason for the visual
artifacts in the segmentation result of the giraffe
in fig. 12(c), where the legs are approximated with
a few straight lines.
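
The preference for one large "step" can be reproduced with a small sketch. It assumes, as an illustration, that each corner between consecutive edge elements with turning angle $A$ and lengths $l_1$, $l_2$ costs $2A^2 / (l_1 + l_2)$; the function name `polyline_curvature_cost` and the two example paths on a 16-connected grid are hypothetical.

```python
import numpy as np

def polyline_curvature_cost(points):
    """Sum over interior vertices of 2 * A^2 / (l1 + l2), where A is the
    turning angle and l1, l2 are the lengths of the adjacent edges."""
    pts = np.asarray(points, dtype=float)
    seg = np.diff(pts, axis=0)
    lengths = np.linalg.norm(seg, axis=1)
    angles = np.arctan2(seg[:, 1], seg[:, 0])
    turn = np.diff(angles)
    turn = (turn + np.pi) % (2.0 * np.pi) - np.pi  # wrap to [-pi, pi)
    return float(np.sum(turn ** 2 * 2.0 / (lengths[:-1] + lengths[1:])))

# Two paths from (0, 0) to (8, 2), i.e. a line of slope 1/4, using only
# edges (1, 0) and (2, 1), which are available in a 16-connected graph.
one_step = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (8, 2)]
many_steps = [(0, 0), (1, 0), (2, 0), (4, 1), (5, 1), (6, 1), (8, 2)]
cost_one = polyline_curvature_cost(one_step)
cost_many = polyline_curvature_cost(many_steps)
```

Under this assumed cost, the path with a single corner is cheaper than the path that follows the line more closely with three corners, which is the effect visible in fig. 13(a).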





Figure 14: (a) Segmentation of the giraffe's head,
where we show the zero level line of tree
min-marginals. The curvature model smooths out the
giraffe's ear. (b) Segmentation result with one pattern
added (shown in the corner).

6.5 Generic Patterns 

As a demonstration of the extensibility of our
model we performed the following simple experiment.
The giraffe's ear is smoothed out by our model
(fig. 14(a)), since it is of high curvature and has
weak support in the color model. We included one
additional pattern which fits the ear well. Like
all other patterns, this new pattern is available
at all locations. The segmentation of the head,
fig. 14(b), clearly improves around the ear.

7 Conclusions and Discussion 

This paper shows how to compute compact rep- 
resentations of higher order priors which enable 
the use of standard algorithms for MAP infer- 
ence. We have demonstrated our method on 
the problem of learning a 'curvature-based' shape 
prior for image inpainting and segmentation. Our 
higher-order shape prior operates on a large set 
(neighbourhood) of pixels and is less sensitive to
discretization artifacts. The applicability of our 
method is not limited to 2D image segmentation 
and inpainting; it could also be used for 3D com- 
pletion. More generally, it can be used to obtain 
tractable representations for higher order priors 
for general labelling problems such as optical flow, 
stereo, and image restoration. 

It would be interesting to extend our approach
to incorporate other types of local shape properties,
not necessarily defined by an analytic function
but perhaps by exemplars. Such a generalization
would very likely require a more general
learning technique, which is an interesting direction
for future research.

A Solvable subclass 

Here we give a short review of the result in [6].
We show that a higher-order term of binary variables
which is a composition of a piecewise-linear
concave $\mathbb{R} \to \mathbb{R}$ function and a modular function
with non-negative coefficients can be represented
as a min-projection of submodular quadratic functions.
In the context of model (1), each higher-order
term (2) must be of the form

$$E_h(x) = \min_{y} \left( a_y \langle l, x_{V(h)} \rangle + c_y \right), \tag{18}$$

where $l \in \mathbb{R}_{+}^{V(h)}$ and $a$ and $c$ are arbitrary. In this
case problem (19), where pairwise terms are submodular
as well, is solvable in polynomial time.
Note that one of the key differences to our
general model is that the weight vector in (2)
can have arbitrary entries for different $y$ and of
different signs for a single $y$. This is a crucial
component for modeling some aspects, e.g. curvature,
of the boundary of a segmentation.
Consider the problem

$$\min_{x} \left[ A(x) + P(x) + G(x) \right], \tag{19}$$

where $A(x)$ is a linear term, $A(x) = \sum_{s \in V} a_s x_s$,
and $P(x)$ is quadratic and submodular. We
seek functions $G$ which can be represented
as

$$\min_{y \in \{0,1\}^{N_y}} \left[ B(x) + C(y) + Q(x, y) + D \right], \tag{20}$$

where $B$ and $C$ are linear and $Q$ is quadratic
submodular. In this case problem (19) would reduce
to minimization of a submodular quadratic function,
which is easily solvable by max-flow/min-cut.

Functions suggested by [ ] may be described as
follows. A function of the form

$$G(x) = \min(0, L(x) + D), \tag{21}$$

where $L(x) = \sum_s l_s x_s$ has all coefficients $l_s$
non-positive and $D \in \mathbb{R}$, can be represented as

$$G(x) = \min_{y_1 \in \{0,1\}} y_1 (L(x) + D) = \min_{y_1 \in \{0,1\}} \Big[ \sum_s l_s x_s y_1 + D y_1 \Big], \tag{22}$$

where the quadratic term in $(x, y)$ is submodular.
In the case when all coefficients $l_s$ are
non-negative, it can be represented as

$$\min_{y_1 \in \{0,1\}} (1 - y_1)(L(x) + D) = \min_{y_1 \in \{0,1\}} \Big[ L(x) + D + \sum_s (-l_s) x_s y_1 - D y_1 \Big], \tag{23}$$

where the quadratic term is again submodular.
This also allows us to represent the function
$\min(L^1(x) + D^1, L^2(x) + D^2) = \min(L^1(x) - L^2(x) + D^1 - D^2, 0) + L^2(x) + D^2$,
under the condition that the coefficients of $L^1 - L^2$
are all non-positive or all non-negative.
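
The representation of a minimum of two linear functions through one auxiliary binary variable can be checked by brute force. The following is a small sketch for illustration; the function names and the sample coefficients are hypothetical.

```python
import itertools
import numpy as np

def min_via_auxiliary(l1, d1, l2, d2, x):
    """Evaluate min(L1(x) + D1, L2(x) + D2) using one auxiliary binary
    variable y, following the forms y*(L + D) and (1 - y)*(L + D)."""
    diff, d = l1 - l2, d1 - d2
    if np.all(diff <= 0):
        # Pairwise coefficients of x_s * y are diff_s <= 0 (submodular).
        inner = min(y * (diff @ x + d) for y in (0, 1))
    elif np.all(diff >= 0):
        # Pairwise coefficients of x_s * y are -diff_s <= 0 (submodular).
        inner = min((1 - y) * (diff @ x + d) for y in (0, 1))
    else:
        raise ValueError("coefficients of L1 - L2 have mixed signs")
    return inner + l2 @ x + d2

def check_representation(l1, d1, l2, d2):
    """Compare with the direct minimum on every binary labeling x."""
    for bits in itertools.product((0, 1), repeat=len(l1)):
        x = np.array(bits, dtype=float)
        direct = min(l1 @ x + d1, l2 @ x + d2)
        if not np.isclose(direct, min_via_auxiliary(l1, d1, l2, d2, x)):
            return False
    return True

l1 = np.array([1.0, 2.0, 0.5])
l2 = np.array([3.0, 2.0, 1.5])   # L1 - L2 has only non-positive coefficients
ok = check_representation(l1, 0.2, l2, -1.0)
```

The check passes whenever the coefficients of $L^1 - L^2$ share a sign; with mixed signs the sketch refuses to build the representation, in line with the restriction discussed next.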

When the coefficients of $L^1 - L^2$ are of indefinite
sign it seems impossible to represent $\min(L^1, L^2)$.
To give an example of why this is so restrictive, consider
$G(x_1, x_2) = \min(x_1, x_2)$. This function is not
submodular: indeed, $G(1,1) + G(0,0) = 1 > 0 =
G(1,0) + G(0,1)$, and therefore it cannot be represented
as a min-projection of a submodular one.
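
The submodularity test used in this example can be written out as a short brute-force check of the inequality $G(x \wedge y) + G(x \vee y) \leq G(x) + G(y)$ over all pairs of binary labelings. The helper name `is_submodular` is hypothetical.

```python
import itertools

def is_submodular(g, n):
    """Brute-force check of g(x AND y) + g(x OR y) <= g(x) + g(y)
    for a function g of n binary variables."""
    labelings = list(itertools.product((0, 1), repeat=n))
    for x in labelings:
        for y in labelings:
            meet = tuple(a & b for a, b in zip(x, y))
            join = tuple(a | b for a, b in zip(x, y))
            if g(*meet) + g(*join) > g(*x) + g(*y) + 1e-12:
                return False
    return True

min_is_submodular = is_submodular(lambda x1, x2: min(x1, x2), 2)
neg_product_is_submodular = is_submodular(lambda x1, x2: -x1 * x2, 2)
```

The check fails for $\min(x_1, x_2)$ at the pair $(1,0)$, $(0,1)$, which is exactly the violation stated above, while it succeeds for $-x_1 x_2$.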

Now consider a more limited case, when $L^1 =
a^1 \sum_s l_s x_s + b^1$ and $L^2 = a^2 \sum_s l_s x_s + b^2$, where
$l_s \geq 0$. Then $\min(L^1, L^2)$ is always representable,
because the coefficients of $L^1 - L^2$ are either all
non-negative or all non-positive, depending on the sign
of $a^1 - a^2$. It is also easy to see that any concave
piecewise-linear $\mathbb{R} \to \mathbb{R}$ function of $\sum_s l_s x_s$ can be
represented as a sum of minima, where each minimum is of
two linear functions satisfying the conditions, and thus
it is itself representable.

B Accounting for overlap 

Let bnd(x) be the set of boundary locations. It is
easy to see that for a closed discrete contour, the
number of points on the contour, |bnd(x)|, is the
same as the number of edges in a 4-connected grid
(i.e. the number of pairs of neighboring pixels with
different labels). Clearly, if $E_h(x)$ approximates the
cost of the curvature in the neighborhood of location
$h$, then the sum (14) will be an inaccurate
approximation of the continuous integral, at least
because it measures the length in the 4-connected
graph metric. However, because each 8x8 pattern
actually "sees" a larger neighborhood of the
boundary, its weights may be adjusted so that the
sum of pattern costs approximates the desired integral
better. For example, weights of the pattern
matching a diagonal line at 45 degrees may be
scaled by $1/\sqrt{2}$. Because neighboring patterns
overlap and because they have to model any arbitrarily
complex boundary, such an adjustment has to be
done jointly for all patterns. Note that the main focus
of [ ] was to address learning of overlapping terms,
in the context of field of experts.
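
The length bias for a diagonal boundary can be verified with a small numerical sketch. It counts label changes between 4-connected neighbors for a discretized half-plane with a 45 degree boundary and compares the count with the approximate Euclidean length of the same boundary; the function name `four_connected_length` and the image size are hypothetical choices for this example.

```python
import numpy as np

def four_connected_length(labels):
    """Number of 4-connected neighbor pairs with different labels."""
    labels = np.asarray(labels)
    horizontal = np.count_nonzero(labels[:, 1:] != labels[:, :-1])
    vertical = np.count_nonzero(labels[1:, :] != labels[:-1, :])
    return horizontal + vertical

n = 64
rows, cols = np.indices((n, n))
half_plane = cols > rows                   # boundary along the main diagonal
grid_length = four_connected_length(half_plane)
euclidean_length = np.sqrt(2.0) * (n - 1)  # approximate length of the diagonal
ratio = grid_length / euclidean_length
```

The ratio equals $\sqrt{2}$, so per-location costs summed along a 45 degree boundary overestimate the integral by this factor unless the corresponding pattern weights are scaled down accordingly.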

We now propose a second training method, 
called Algorithm 2, which attempts to deal with 
these problems. The goal is to adjust the pat- 
tern weights such that the total curvature cost of 
a shape is approximated as well as possible. 

In particular, given a set of larger images, indexed
by $i$, e.g. of size 100 x 100, where each image
depicts a different continuous shape $S^i$ with
discretization $x^i$ and total curvature cost $f(S^i)$, we now
formulate the learning problem as:

$$\arg\min_{w,c} \sum_i \Big| \sum_{h \in U} E_h(x^i) - f(S^i) \Big|. \tag{24}$$


Let us use the same trick as in Alg. 1.
If we linearize all terms by fixing the
current best pattern for each location in
each image, the problem simplifies. We
can therefore iterate the following two steps:

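Algorithm 2
Input: $x^i$, $w$, $c$

1. For all training images $i$ find matching patterns
at all locations $h \in U$:

$$y_h^i = \arg\min_{y} \Big[ \sum_{v \in V(h)} w_{y,v} x_v^i + c_y \Big] \tag{25}$$

2. Refit all patterns:

$$(w, c) = \arg\min_{w,c} \sum_i \Big| \sum_{h} \Big( \sum_{v \in V(h)} w_{y_h^i, v} x_v^i + c_{y_h^i} \Big) - f(S^i) \Big|
\quad \text{s.t.} \quad \Big[ \min_{x \in \{0,1\}^{K^2}} w_j^{\top} x \Big] + c_j \geq 0 \quad \forall j \in P. \tag{26}$$
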

The constraint in the second step enforces that
the pattern cost for each arbitrary labeling is at
least 0, which is the lowest value of the function $f$.

Figure 15: Objective of (24) during iterations of
Alg. 2. The red curve is evaluated on training data
and the green dashed curve is evaluated on the
independent test data.

The main difference to Alg. 1 is that the refitting
step no longer decouples into estimating pattern
weights independently. Step (26) is now solved
with an LP.
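
Step 1 of Alg. 2 and the constraint of step 2 can be sketched as follows. This is an illustration, not our implementation: patches are assumed to be flattened binary vectors, the names `match_patterns` and `min_feasible_constants` are hypothetical, and the LP refit itself is not shown.

```python
import numpy as np

def match_patterns(patches, w, c):
    """Step 1: for every location pick the pattern of minimal cost.
    patches: (n_locations, n_pixels) binary, w: (n_patterns, n_pixels),
    c: (n_patterns,). Returns pattern indices and their costs."""
    costs = patches @ w.T + c
    best = np.argmin(costs, axis=1)
    return best, costs[np.arange(len(best)), best]

def min_feasible_constants(w):
    """Smallest c_j such that min_x w_j^T x + c_j >= 0 over binary x;
    the minimum of w_j^T x is the sum of the negative entries of w_j."""
    return -np.minimum(w, 0.0).sum(axis=1)

# Toy example with two patterns over two pixels (illustrative numbers).
w = np.array([[1.0, -2.0], [-1.0, -1.0]])
c = min_feasible_constants(w)
patches = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)
best, per_location = match_patterns(patches, w, c)
total_model_cost = per_location.sum()  # model cost of this toy "image"
```

Once the best patterns are fixed, the total model cost of an image is linear in $(w, c)$, which is what makes the refitting step a linear program.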

We tested Alg. 2 to retrain the costs $c$ on the class
of Fourier shapes (see fig. 7(g), right) so that the total
cost assigned by the model better matches
the true cost. It is initialized with patterns $(w, c)$
which were learned using Alg. 1. Here we kept
the weights $w$ fixed and only updated the constants
$c$ in step 2. Figure 15 shows the progress of Alg. 2.
Figure 16 evaluates the newly trained model on independent
samples of circles and Fourier shapes. We
see that the bias of the model to overestimate the
total cost was reduced, compared to fig. 7(d,e).
As expected, this is especially true for the Fourier
shapes, since only Fourier shapes were used in
training. Note that since retraining with Alg. 2 gave
only a mild improvement in approximation error,
we did not use these patterns, but rather the ones
obtained after Alg. 1 for all of our experiments.

Figure 16: Results after applying Alg. 2 to refit
constant terms. We show approximate total cost
/ length vs. true total cost / length, for circles
(a) and Fourier shapes (b) respectively. As can be
seen, the bias is reduced.